Skip to content

Feat S3 Transfer Manager v2 GetObject/DownloadObject #2996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Mar 25, 2025

Conversation

wty-Bryant
Copy link
Contributor

Implement v2 s3 transfer manager's GetObject/DownloadObject api bound to single union client which mimics normal service client's initialization and api call. User could now download object sequentially with GetObject or asynchronously with DownloadObject.

Test: passed unit test for GetObject and DownloadObject, passed integration test of uploading object via s3 client and validating its content after downloading through v2 transfer manager.

@wty-Bryant wty-Bryant requested a review from a team as a code owner January 31, 2025 21:58
}

// Queue the next range of bytes to read.
ch <- getChunk{w: g.w, start: g.pos - g.offset, withRange: g.byteRange()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as far as I can tell, you're now doing concurrent downloads but you're buffering the entire object in memory before returning it to the caller - which is all well and good until someone goes to try and pull a multi-gigabyte object.

We're basically going to have to return a reader that "composes" the response bodies from inner getPart requests in order. That will allow you to stream the response back to the caller without that extra memory pressure. You might have a "scratch" buffer for that internal composition but naturally it won't be the full object size.

@wty-Bryant wty-Bryant force-pushed the feat-tmv2-getobject branch 2 times, most recently from f404d3f to 0ce3dca Compare March 14, 2025 22:23
// Deprecated: This field is handled inconsistently across AWS SDKs. Prefer using
// the ExpiresString field which contains the unparsed value from the service
// response.
Expires time.Time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to address this now but I question whether we should have the time.Time version of this in the transfermanager. We should be using these APIs as an opportunity to get rid of old baggage like this.

Copy link
Contributor

@lucix-aws lucix-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed the nuances of exposing the GetObject reader offline, you're free to address that now or ship and address in a followup.

@@ -1,193 +0,0 @@
package testing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question, but what's the expected status of the old transfer manager? if we're not completely getting rid of it, do we still keep testing for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, we just leave it here and make any change if there's any checksum related update

}
},
},
"parts download with part number input": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually check that we're doing part object? It seems like if the data is small enough, this is the equivalent of just getting everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this test case verifies that we single download the whole object anytime user passes a part number

g.r.setCapacity(capacity)
}

if i == capacity {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we hit this case, wouldn't we deadlock if g.r.getRead() != capacity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g.r.read would be increased by reader in another goroutine, that's why we continue the loop to re-check g.r.getRead() at line 668

@wty-Bryant wty-Bryant force-pushed the feat-tmv2-getobject branch 2 times, most recently from 3706344 to a3113eb Compare March 25, 2025 02:43
wty-Bryant and others added 13 commits March 24, 2025 22:43
* recommit transfer manager v2 files

* change pool to store slice pointer

* add integ test for putobject

* update go mod

* minor changes for v0.1.0

* update tags

* update tags

* update integ test dependency version

* change err var name

* update go mod

* change input/output type comment

* minor change

---------

Co-authored-by: Tianyi Wang <[email protected]>

rebase from main

rebase branch from main
move transfer manager v2 integration tests to within module

move putobject integ test to transfer manager module
@wty-Bryant wty-Bryant force-pushed the feat-tmv2-getobject branch from a3113eb to 684ca8e Compare March 25, 2025 02:44
@wty-Bryant wty-Bryant merged commit 6fa167a into main Mar 25, 2025
13 checks passed
@wty-Bryant wty-Bryant deleted the feat-tmv2-getobject branch March 25, 2025 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants